Practical Compiler Techniques on Efficient Multithreaded Code Generation for OpenMP Programs

نویسندگان

  • Xinmin Tian
  • Milind Girkar
  • Aart J. C. Bik
  • Hideki Saito
چکیده

State-of-the-art multiprocessor systems pose several difficulties: (i) the user has to parallelize the existing serial code; (ii) explicitly threaded programs using a thread library are not portable; (iii) writing efficient multi-threaded programs requires intimate knowledge of machine’s architecture and micro-architecture. Thus, well-tuned parallelizing compilers are in high demand to leverage state-of-the-art computer advances of NUMA-based multiprocessors, simultaneous multi-threading processors and chip-multiprocessor systems in response to the performance quest from the highperformance computing community. On the other hand, OpenMP* has emerged as the industry standard parallel programming model. Applications can be parallelized using OpenMP with less effort in a way that is portable across a wide range of multiprocessor systems. In this paper, we present several practical compiler optimization techniques and discuss their effect on the performance of OpenMP programs. We elaborate on the major design considerations in a high performance OpenMP compiler and present experimental data based on the implementation of the optimizations in the Intel® C++ and Fortran compilers. Interactions of the OpenMP transformation with other sequential optimizations in the compiler are discussed. The techniques in this paper have achieved significant performance improvements on the industry standard SPEC* OMPM2001 and SPEC* OMPL2001 benchmarks, and these performance results are presented for Intel® Pentium® and Itanium® processor based systems.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Exploring the Use of Hyper-Threading Technology for Multimedia Applications with Intel® OpenMP* Compiler

Processors with Hyper-Threading technology can improve the performance of applications by permitting a single processor to process data as if it were two processors by executing instructions from different threads in parallel rather than serially. However, the potential performance improvement can be only obtained if an application is multithreaded by parallelization techniques. This paper pres...

متن کامل

A framework for an automatic hybrid MPI+OpenMP code generation

Clusters of symmetric multiprocessors (SMPs) are the most currently used architecture for large scale applications and combining MPI and OpenMP models is regarded as a suitable programming model for such architectures. But writing efficient MPI+OpenMP programs requires expertise and performance analysis to determine the best number of processes and threads for the optimal execution for a given ...

متن کامل

Performance Tuning of x86 OpenMP Codes with MAQAO

Failing to find the best optimization sequence for a given application code can lead to compiler generated codes with poor performances or inappropriate code. It is necessary to analyze performances from the assembly generated code to improve over the compilation process. This paper presents a tool for the performance analysis of multithreaded codes (OpenMP programs support at the moment). MAQA...

متن کامل

Tile Percolation: An OpenMP Tile Aware Parallelization Technique for the Cyclops-64 Multicore Processor

Programming a multicore processor is difficult. It is even more difficult if the processor has software-managed memory hierarchy, e.g. the IBM Cyclops-64 (C64). A widely accepted parallel programming solution for multicore processor is OpenMP. Currently, all OpenMP directives are only used to decompose computation code (such as loop iterations, tasks, code sections, etc.). None of them can be u...

متن کامل

Animating the Formalised Semantics of a Java-Like Language

Considerable effort has gone into the techniques of extracting executable code from formal specifications and animating them. We show how to apply these techniques to the large JinjaThreads formalisation. It models a substantial subset of multithreaded Java source and bytecode in Isabelle/HOL and focuses on proofs and modularity whereas code generation was of little concern in its design. Emplo...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Comput. J.

دوره 48  شماره 

صفحات  -

تاریخ انتشار 2005